The Whiz and Viz Bang of Data

class: center, middle, inverse, title-slide

.title[
# The Whiz and Viz Bang of Data
]
.subtitle[
## The Basics of Visualization and Modeling
]
.author[
### Dr. Christopher Kenaley
]
.institute[
### Boston College
]
.date[
### 2024/9/16
]

---

class: inverse, top
# In class today

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.14.0/css/all.min.css">

.pull-left[
Today we'll . . .

- Review/learn about visualization, model choice, and phylogenetic correction

- Look at some models

- Choose which models fit best

- Peek under the hood of Module Project 3

Next time . . .

- Account for phylogenetic history

]

.pull-right[
![](https://miro.medium.com/max/1200/0*MSmfUESNp4eSzNy_)
]

---
class: inverse, top

## What is a model?

- A mathematical explanation of a process or system

- Predictions in R: `y~x`

- But can be more complex:

  * `y~x+a`
  * `y~x+a+b`
  * `y~x+a+b+c`
  * etc.
  
- Linear model: `lm(y~x)`

  * But could be some other model
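
---
class: inverse, top

## What is a model?

A minimal sketch of how the formulas above are passed to `lm()`, assuming a made-up data frame `toy` with a response `y` and predictors `x` and `a` (these names and values are illustrative only, not course data):

``` r
# Hypothetical data: y depends on x and a, plus random noise
set.seed(1)
toy <- data.frame(x = 1:30, a = rnorm(30))
toy$y <- 2 * toy$x + 5 * toy$a + rnorm(30)

# One predictor: y ~ x
m1 <- lm(y ~ x, data = toy)

# Two additive predictors: y ~ x + a
m2 <- lm(y ~ x + a, data = toy)

# Each added numeric term adds one estimated coefficient
length(coef(m1)) # intercept + x
length(coef(m2)) # intercept + x + a
```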

---
class: inverse, top

## What is a model?

``` r
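library(tidyverse) # provides tibble(), %>%, and ggplot()

# Simulate two species with different slopes and uniform noise
set.seed(123)
x.A=1:50
y.A=x.A*2+runif(50,1,200)
x.B=1:50
y.B=x.B*3.5+runif(50,1,200)

d <- tibble(x=c(x.A,x.B),y=c(y.A,y.B),species=c(rep("A",50),rep("B",50)))

# Plot the points with a separate linear fit for each species
d%>%
  ggplot(aes(x,y,col=species))+geom_point()+geom_smooth(method="lm")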
```

```
## `geom_smooth()` using formula = 'y ~ x'
```

![](3140_f24_9-16_files/figure-html/unnamed-chunk-2-1.png)
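
---
class: inverse, top

## What is a model?

Because `species` is mapped to color, `geom_smooth(method="lm")` draws one fitted line per species. A rough base-R sketch of the same idea, rebuilding the simulated data as a plain data frame and fitting each species separately (the names `fits` and `per.species` are introduced here for illustration):

``` r
# Rebuild the simulated data without tidyverse
set.seed(123)
x.A <- 1:50
y.A <- x.A * 2 + runif(50, 1, 200)
x.B <- 1:50
y.B <- x.B * 3.5 + runif(50, 1, 200)
d <- data.frame(x = c(x.A, x.B), y = c(y.A, y.B),
                species = rep(c("A", "B"), each = 50))

# Fit one linear model per species
fits <- lapply(split(d, d$species), function(g) lm(y ~ x, data = g))

# Intercept and slope for each species (rows = terms, columns = species)
per.species <- sapply(fits, coef)
per.species
```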

---
class: inverse, top

## Are models accurate descriptions of the process/system?

``` r
spec.lm1 <- lm(y~x+species,data=d)

anova(spec.lm1)
```

```
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## x          1 103506  103506 29.5261 4.099e-07 ***
## species    1  22023   22023  6.2823   0.01386 *  
## Residuals 97 340040    3506                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
class: inverse, top

## Are models accurate descriptions of the process/system?

``` r
summary(spec.lm1)
```

```
## 
## Call:
## lm(formula = y ~ x + species, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -116.94  -47.00   -3.31   50.33  115.69 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  98.6482    13.4004   7.362 5.94e-11 ***
## x             2.2294     0.4103   5.434 4.10e-07 ***
## speciesB     29.6803    11.8416   2.506   0.0139 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59.21 on 97 degrees of freedom
## Multiple R-squared:  0.2696,	Adjusted R-squared:  0.2546 
## F-statistic:  17.9 on 2 and 97 DF,  p-value: 2.41e-07
```
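
---
class: inverse, top

## Are models accurate descriptions of the process/system?

In the additive model `y~x+species`, both species share one slope and the `speciesB` coefficient shifts the intercept for species B. A small sketch of how the coefficients turn into predictions, assuming the same simulated data and an arbitrary value of `x = 25` chosen only for illustration:

``` r
# Rebuild the simulated data without tidyverse
set.seed(123)
x.A <- 1:50
y.A <- x.A * 2 + runif(50, 1, 200)
x.B <- 1:50
y.B <- x.B * 3.5 + runif(50, 1, 200)
d <- data.frame(x = c(x.A, x.B), y = c(y.A, y.B),
                species = rep(c("A", "B"), each = 50))

spec.lm1 <- lm(y ~ x + species, data = d)
b <- coef(spec.lm1)

# Prediction by hand at x = 25
pred.A <- b["(Intercept)"] + b["x"] * 25                 # species A
pred.B <- b["(Intercept)"] + b["x"] * 25 + b["speciesB"] # species B

# Same predictions from predict()
pred.auto <- predict(spec.lm1,
                     newdata = data.frame(x = 25, species = c("A", "B")))
all.equal(unname(c(pred.A, pred.B)), unname(pred.auto))
```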

---
class: inverse, top

## Are models accurate descriptions of the process/system?

## Information theory

.pull-left[

``` r
spec.lm2 <- lm(y~x*species,d)
anova(spec.lm2)
```

```
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## x          1 103506  103506  32.631 1.247e-07 ***
## species    1  22023   22023   6.943  0.009812 ** 
## x:species  1  35530   35530  11.201  0.001168 ** 
## Residuals 96 304510    3172                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]
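
---
class: inverse, top

## Are models accurate descriptions of the process/system?

In an R formula, `x*species` is shorthand for `x + species + x:species`, that is, both main effects plus their interaction. A quick sketch to check this on the same simulated data (`spec.lm2.long` is a name introduced here for illustration):

``` r
# Rebuild the simulated data without tidyverse
set.seed(123)
x.A <- 1:50
y.A <- x.A * 2 + runif(50, 1, 200)
x.B <- 1:50
y.B <- x.B * 3.5 + runif(50, 1, 200)
d <- data.frame(x = c(x.A, x.B), y = c(y.A, y.B),
                species = rep(c("A", "B"), each = 50))

# Shorthand and expanded forms of the interaction model
spec.lm2 <- lm(y ~ x * species, data = d)
spec.lm2.long <- lm(y ~ x + species + x:species, data = d)

# Both forms estimate the same four coefficients
same.fit <- all.equal(coef(spec.lm2), coef(spec.lm2.long))
same.fit
```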

---
class: inverse, top

## Are models accurate descriptions of the process/system?

## Information theory

.pull-left[

``` r
AIC(spec.lm1,spec.lm2)
```

```
##          df      AIC
## spec.lm1  4 1104.953
## spec.lm2  5 1095.917
```
![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*qzNtzGi7HyVXmxrfOcuaww.png)

]

.pull-right[
![](https://timeseriesreasoning.files.wordpress.com/2021/06/a6352-1nurn_wtjfpwin0mc6t7myq.png)
]
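
---
class: inverse, top

## Information theory

AIC balances fit against complexity: `AIC = 2k - 2*logLik`, where `k` is the number of estimated parameters (for `lm()`, the coefficients plus the residual variance). The model with the lower AIC is preferred. A sketch of the calculation by hand, assuming the same simulated data (`aic.by.hand` is a helper written for this example, not a library function):

``` r
# Rebuild the simulated data without tidyverse
set.seed(123)
x.A <- 1:50
y.A <- x.A * 2 + runif(50, 1, 200)
x.B <- 1:50
y.B <- x.B * 3.5 + runif(50, 1, 200)
d <- data.frame(x = c(x.A, x.B), y = c(y.A, y.B),
                species = rep(c("A", "B"), each = 50))

spec.lm1 <- lm(y ~ x + species, data = d)
spec.lm2 <- lm(y ~ x * species, data = d)

# Hypothetical helper: AIC from the log-likelihood and parameter count
aic.by.hand <- function(m) {
  ll <- logLik(m)
  k <- attr(ll, "df") # number of estimated parameters
  2 * k - 2 * as.numeric(ll)
}

# Should agree with the built-in AIC()
aic.manual <- c(aic.by.hand(spec.lm1), aic.by.hand(spec.lm2))
aic.builtin <- AIC(spec.lm1, spec.lm2)$AIC
all.equal(aic.manual, aic.builtin)
```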